Index Pruning and Result Reranking: Effects on Ad-Hoc Retrieval and Named Page Finding

Authors

Stefan Büttcher

Charles L. A. Clarke

Peter C. K. Yeung

Abstract

We describe experiments conducted for the TREC 2006 Terabyte track. Our experiments center on two concepts: static index pruning (for increased retrieval efficiency) and result reranking (for improved precision). We investigate their effect on retrieval efficiency and effectiveness, paying special attention to the difference between ad-hoc retrieval and named page finding. We show that index pruning and reranking based on relevance models can be beneficial in an ad-hoc retrieval setting, but have a disastrous effect on the effectiveness of named page finding. Result reranking based on anchor text, on the other hand, is very useful for named page finding, but should not be used for ad-hoc retrieval. This dichotomy poses a problem for search engines, as there is no easy way for a search engine to decide whether a given query represents an ad-hoc retrieval task, whose purpose is to satisfy an abstract information need, or a named page finding task, which targets a specific document.


Similar resources

THUIR at TREC 2005 Terabyte Track

This year, the IR group of Tsinghua University used its TMiner text retrieval system for indexing and retrieval in the Terabyte track ad hoc and named-page subtasks. For both tasks, we used the in-link anchor texts (the anchor text of URLs in the collection that point to the current page) together with the content text of the web pages to build the indices. When retrieving, the word-...


National Taiwan University at Terabyte Track of TREC 2005

There are three tasks in the Terabyte track of TREC 2005: Efficiency, Ad hoc, and Named page finding. We participated in all three and used a different retrieval method for each, aiming to match the retrieval method to the characteristics of the task. In the Ad hoc task, we adopt the technique of query-specific clustering. In the Named page finding task, we...


Finding Task 4 . 1 Motivation and Description of Our Method

There are three tasks in the Terabyte track of TREC 2005, i.e. Efficiency, Ad hoc and Named page finding. We participated in all the tasks and used different retrieval methods to deal with each task, aiming to vary the retrieval method according to the different characteristics of different tasks.


A Comparative Study of Probabilistic and Language Models for Information Retrieval

Language models for information retrieval have received much attention in recent years, with many claims being made about their performance. However, previous studies evaluating the language modelling approach for information retrieval used different query sets and heterogeneous collections, which make reported results difficult to compare. This research is a broad-based study that evaluates la...




Publication date: 2006
